Genomic DNA copy number alterations are key genetic events in
the development and progression of human cancers. Here we
report a genome-wide microarray comparative genomic
hybridization (array CGH) analysis of DNA copy number variation in
a series of primary human breast tumors. We have profiled DNA
copy number alteration across 6,691 mapped human genes, in 44
predominantly advanced, primary breast tumors and 10 breast
cancer cell lines. While the overall patterns of DNA amplification
and deletion corroborate previous cytogenetic studies, the
high-resolution (gene-by-gene) mapping of amplicon boundaries and
the quantitative analysis of amplicon shape provide significant
improvement in the localization of candidate oncogenes. Parallel
microarray measurements of mRNA levels reveal the remarkable
degree to which variation in gene copy number contributes to
variation in gene expression in tumor cells. Specifically, we find
that 62% of highly amplified genes show moderately or highly
elevated expression, that DNA copy number influences gene
expression across a wide range of DNA copy number alterations
(deletion, low-, mid- and high-level amplification), that on average,
a 2-fold change in DNA copy number is associated with a
corresponding 1.5-fold change in mRNA levels, and that overall, at least
12% of all the variation in gene expression among the breast
tumors is directly attributable to underlying variation in gene copy
number. These findings provide evidence that widespread DNA
copy number alteration can lead directly to global deregulation of
gene expression, which may contribute to the development or
progression of cancer.

Conventional cytogenetic techniques, including comparative
genomic hybridization (CGH) (1), have led to the identification
of a number of recurrent regions of DNA copy number
alteration in breast cancer cell lines and tumors (2-4). While
some of these regions contain known or candidate oncogenes
[e.g., FGFR1 (8p11), MYC (8q24), CCND1 (11q13), ERBB2
(17q12), and ZNF217 (20q13)] and tumor suppressor genes
[RB1 (13q14) and TP53 (17p13)], the relevant gene(s) within
other regions (e.g., gain of 1q, 8q22, and 17q22-24, and loss of
8p) remain to be identified. A high-resolution genome-wide
map, delineating the boundaries of DNA copy number
alterations in tumors, should facilitate the localization and
identification of oncogenes and tumor suppressor genes in breast
cancer. In this study, we have created such a map, using
array-based CGH (5-7) to profile DNA copy number alteration
in a series of breast cancer cell lines and primary tumors.

An unresolved question is the extent to which the widespread
DNA copy number changes that we and others have identified
in breast tumors alter expression of genes within involved
regions. Because we had measured mRNA levels in parallel in
the same samples (8), using the same DNA microarrays, we had
an opportunity to explore on a genomic scale the relationship
between DNA copy number changes and gene expression. From
this analysis, we have identified a significant impact of
widespread DNA copy number alteration on the transcriptional
programs of breast tumors.

Materials and Methods 

Tumors and Cell Lines. Primary breast tumors were predominantly
large (>3 cm), intermediate-grade, infiltrating ductal
carcinomas, with more than half being lymph node positive. The
fraction of tumor cells within specimens averaged at least 50%.
Details of individual tumors have been published (8, 9), and
are summarized in Table 1, which is published as supporting
information on the PNAS web site, www.pnas.org. Breast cancer
cell lines were obtained from the American Type Culture
Collection. Genomic DNA was isolated either using Qiagen
genomic DNA columns, or by phenol/chloroform extraction
followed by ethanol precipitation.

DNA Labeling and Microarray Hybridizations. Genomic DNA
labeling and hybridizations were performed essentially as described
in Pollack et al. (7), with slight modifications. Two micrograms
of DNA was labeled in a total volume of 50 microliters and the
volumes of all reagents were adjusted accordingly. "Test" DNA
(from tumors and cell lines) was fluorescently labeled (Cy5) and
hybridized to a human cDNA microarray containing 6,691
different mapped human genes (i.e., UniGene clusters). The
"reference" (labeled with Cy3) for each hybridization was
normal female leukocyte DNA from a single donor. The fabrication
of cDNA microarrays and the labeling and hybridization of
mRNA samples have been described (8).

Fig. 1. Genome-wide measurement of DNA copy number alteration by array CGH. (a) DNA copy number profiles are illustrated for cell lines containing different
numbers of X chromosomes, for breast cancer cell lines, and for breast tumors. Each row represents a different cell line or tumor, and each column represents
one of 6,691 different mapped human genes present on the microarray, ordered by genome map position from 1pter through Xqter. Moving average (symmetric
5-nearest neighbors) fluorescence ratios (test/reference) are depicted using a log2-based pseudocolor scale (indicated), such that red luminescence reflects
fold-amplification, green luminescence reflects fold-deletion, and black indicates no change (gray indicates poorly measured data). (b) Enlarged view of DNA
copy number profiles across the X chromosome, shown for cell lines containing different numbers of X chromosomes.

Data Analysis and Map Positions. Hybridized arrays were scanned
on a GenePix scanner (Axon Instruments, Foster City, CA), and
fluorescence ratios (test/reference) calculated using ScanAlyze
software (available at http://rana.lbl.gov). Fluorescence ratios
were normalized for each array by setting the average log
fluorescence ratio for all array elements equal to 0.
Measurements with fluorescence intensities more than 20% above
background were considered reliable. DNA copy number profiles
that deviated significantly from background ratios measured in
normal genomic DNA control hybridizations were interpreted as
evidence of real DNA copy number alteration (see Estimating
Significance of Altered Fluorescence Ratios in the supporting
information). When indicated, DNA copy number profiles are
displayed as a moving average (symmetric 5-nearest neighbors).
Map positions for arrayed human cDNAs were assigned by
identifying the starting position of the best and longest match of
any DNA sequence represented in the corresponding UniGene
cluster (10) against the "Golden Path" genome assembly
(http://genome.ucsc.edu/; Oct 7, 2000 Freeze). For UniGene
clusters represented by multiple arrayed elements, mean
fluorescence ratios (for all elements representing the same UniGene
cluster) are reported. For mRNA measurements, fluorescence
ratios are "mean-centered" (i.e., reported relative to the mean
ratio across the 44 tumor samples). The data set described here
can be accessed in its entirety in the supporting information.
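
The normalization and smoothing steps described under Data Analysis
can be illustrated with a minimal Python sketch. It is not the
ScanAlyze implementation. It assumes base-2 logarithms and reads
"symmetric 5-nearest neighbors" as a five-gene window (the gene plus
two neighbors on each side), truncated at the ends of the gene order;
the function names and toy ratios are hypothetical.

```python
import math

def normalize_log_ratios(ratios):
    """Center log2(test/reference) ratios so their array-wide mean is 0."""
    logs = [math.log2(r) for r in ratios]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]

def moving_average(values, half_window=2):
    """Symmetric moving average over genes ordered by map position.

    Assumption: a 5-gene window (gene plus 2 neighbors per side),
    shortened where fewer neighbors exist at either end.
    """
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed

# Toy test/reference fluorescence ratios for five neighboring genes.
ratios = [1.0, 2.0, 4.0, 2.0, 1.0]
centered = normalize_log_ratios(ratios)
smoothed = moving_average(centered)
```

After centering, the log ratios sum to zero for the array, so a value
of 0 corresponds to the array-wide average rather than to any fixed
copy number.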

Results 

Fig. 2. DNA copy number alteration across chromosome 8 by array CGH. (a) DNA copy number profiles are illustrated for cell lines containing different numbers
of X chromosomes, for breast cancer cell lines, and for breast tumors. Breast cancer cell lines and tumors are separately ordered by hierarchical clustering to
highlight recurrent copy number changes. The 241 genes present on the microarrays and mapping to chromosome 8 are ordered by position along the
chromosome. Fluorescence ratios (test/reference) are depicted by a log2 pseudocolor scale (indicated). Selected genes are indicated with color-coded text (red,
increased; green, decreased; black, no change; gray, not well measured) to reflect correspondingly altered mRNA levels (observed in the majority of the subset
of samples displaying the DNA copy number change). The map positions for genes of interest that are not represented on the microarray are indicated in the
row above those genes represented on the array. (b) Graphical display of DNA copy number profile for breast cancer cell line SKBR3. Fluorescence ratios
(tumor/normal) are plotted on a log2 scale for chromosome 8 genes [...]

We performed CGH on 44 predominantly locally advanced,
primary breast tumors and 10 breast cancer cell lines, using
cDNA microarrays containing 6,691 different mapped human
genes (Fig. 1a; also see Materials and Methods for details of
microarray hybridizations). To take full advantage of the
improved spatial resolution of array CGH, we ordered
(fluorescence ratios for) the 6,691 cDNAs according to the "Golden
Path" (http://genome.ucsc.edu/) genome assembly of the draft
human genome sequences (11). In so doing, arrayed cDNAs not
only themselves represent genes of potential interest (e.g.,
candidate oncogenes within amplicons), but also provide precise
genetic landmarks for chromosomal regions of amplification and
deletion. Parallel analysis of DNA from cell lines containing
different numbers of X chromosomes (Fig. 1b), as we did before
(7), demonstrated the sensitivity of our method to detect single
copy loss (45,XO), and 1.5- (47,XXX), 2- (48,XXXX), or
2.5-fold (49,XXXXX) gains (also see Fig. 5, which is published
as supporting information on the PNAS web site). Fluorescence
ratios were linearly proportional to copy number ratios, which
were slightly underestimated, in agreement with previous
observations (7). Numerous DNA copy number alterations were
evident in both the breast cancer cell lines and primary tumors
(Fig. 1a), detected in the tumors despite the presence of euploid
non-tumor cell types; the magnitudes of the observed changes
were generally lower in the tumor samples. DNA copy number
alterations were found in every cancer cell line and tumor, and
on every human chromosome in at least one sample. Recurrent
regions of DNA copy number gain and loss were readily
identifiable. For example, gains within 1q, 8q, 17q, and 20q were
observed in a high proportion of breast cancer cell lines/tumors
(90%/69%, 100%/47%, 100%/60%, and 90%/44%,
respectively), as were losses within 1p, 3p, 8p, and 13q (80%/24%,
80%/22%, 80%/22%, and 70%/18%, respectively), consistent
with published cytogenetic studies (refs. 2-4; a complete listing
of gains/losses is provided in Tables 2 and 3, which are published
as supporting information on the PNAS web site). The total
number of genomic alterations (gains and losses) was found to
be significantly higher in breast tumors that were high grade (P =
0.008), consistent with published CGH data (3), estrogen
receptor negative (P = 0.04), and harboring TP53 mutations (P =
0.0006) (see Table 4, which is published as supporting
information on the PNAS web site).
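
The X-chromosome titration described above has a simple arithmetic
core, sketched here in Python. With a normal female reference (two X
copies), the expected test/reference ratio is the X copy number
divided by 2, which reproduces the 0.5-, 1.5-, 2-, and 2.5-fold values
quoted in the text. The second function is an assumption added for
illustration, not a model taken from the paper: it treats a tumor
specimen as a linear mixture in which euploid non-tumor cells
contribute a ratio of 1, which is one way to see why observed changes
are smaller in tumors than in cell lines. All names are hypothetical.

```python
def expected_x_ratio(x_copies, reference_copies=2):
    """Expected test/reference ratio for X-linked genes."""
    return x_copies / reference_copies

def diluted_ratio(tumor_ratio, tumor_fraction):
    """Observed ratio when only part of the specimen is tumor.

    Assumption: simple linear mixing, with non-tumor cells at ratio 1.
    """
    return tumor_fraction * tumor_ratio + (1.0 - tumor_fraction) * 1.0

karyotypes = {"45,XO": 1, "47,XXX": 3, "48,XXXX": 4, "49,XXXXX": 5}
x_ratios = {name: expected_x_ratio(n) for name, n in karyotypes.items()}

# A 4-fold amplification seen through a specimen that is 50% tumor cells.
observed = diluted_ratio(4.0, 0.5)
```

Under this mixing assumption, a 4-fold amplification in a specimen
with 50% tumor cells would be measured as a 2.5-fold change.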

The improved spatial resolution of our array CGH analysis is
illustrated for chromosome 8, which displayed extensive DNA
copy number alteration in our series. A detailed view of the
variation in the copy number of 241 genes mapping to
chromosome 8 revealed multiple regions of recurrent amplification;
each of these potentially harbors a different known or previously
uncharacterized oncogene (Fig. 2a). The complexity of amplicon
structure is most easily appreciated in the breast cancer cell line
SKBR3. Although a conventional CGH analysis of 8q in SKBR3
identified only two distinct regions of amplification (12), we
observed three distinct regions of high-level amplification
(labeled 1-3 in Fig. 2b). For each of these regions we can define the
boundaries of the interval recurrently amplified in the tumors we
examined; in each case, known or plausible candidate oncogenes
can be identified (a description of these regions, as well as the
recurrently amplified regions on chromosomes 17 and 20, can be
found in Figs. 6 and 7, which are published as supporting
information on the PNAS web site).
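
To make the idea of gene-by-gene amplicon boundaries concrete, the
following Python sketch scans ratios ordered by map position and
reports each run of consecutive genes above a threshold. This is an
illustration only and not the procedure used in the study; the
threshold of 4 is borrowed from the high-level amplification class
used elsewhere in the paper, and the function name and toy profile are
hypothetical.

```python
def amplified_regions(ratios, threshold=4.0):
    """Return inclusive (start, end) index pairs of consecutive genes
    whose ratio exceeds the threshold, in map order."""
    regions = []
    start = None
    for i, ratio in enumerate(ratios):
        if ratio > threshold:
            if start is None:
                start = i          # a new region opens here
        elif start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:          # region runs to the last gene
        regions.append((start, len(ratios) - 1))
    return regions

# Toy profile with three separate high-level regions.
toy_profile = [1.0, 5.2, 6.1, 1.1, 0.9, 4.5, 1.0, 8.0, 7.5, 9.1]
regions = amplified_regions(toy_profile)
```

Lower-resolution methods would tend to merge neighboring runs such as
these into fewer, broader regions.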

For a subset of breast cancer cell lines and tumors (4 and 37,
respectively), and a subset of arrayed genes (6,095), mRNA
levels were quantitatively measured in parallel by using cDNA
microarrays (8). The parallel assessment of mRNA levels is
useful in the interpretation of DNA copy number changes. For
example, the highly amplified genes that are also highly
expressed are the strongest candidate oncogenes within an
amplicon. Perhaps more significantly, our parallel analysis of DNA
copy number changes and mRNA levels provides us the
opportunity to assess the global impact of widespread DNA copy
number alteration on gene expression in tumor cells.

Pollack et al.

PNAS | October 1, 2002 | vol. 99 | no. 20 | 12965

[Fig. 3 image. Scale label: test/ref ratio. Gene axis: 17pter to 17qter.]

Fig. 3. Concordance between DNA copy number and gene expression across chromosome 17. DNA copy number alteration (Upper) and mRNA levels (Lower)
are illustrated for breast cancer cell lines and tumors. Breast cancer cell lines and tumors are separately ordered by hierarchical clustering (Upper), and the
identical sample order is maintained (Lower). The 354 genes present on the microarrays and mapping to chromosome 17, and for which both DNA copy number
and mRNA levels were determined, are ordered by position along the chromosome; selected genes are indicated in color-coded text (see Fig. 2 legend).
Fluorescence ratios (test/reference) are depicted by separate log2 pseudocolor scales (indicated).

A strong influence of DNA copy number on gene expression
is evident in an examination of the pseudocolor representations
of DNA copy number and mRNA levels for genes on
chromosome 17 (Fig. 3). The overall patterns of gene amplification and
elevated gene expression are quite concordant; i.e., a significant
fraction of highly amplified genes appear to be correspondingly
highly expressed. The concordance between high-level
amplification and increased gene expression is not restricted to
chromosome 17. Genome-wide, of 117 high-level DNA
amplifications (fluorescence ratios >4, and representing 91 different
genes), 62% (representing 54 different genes; see Table 5, which
is published as supporting information on the PNAS web site)
are found associated with at least moderately elevated mRNA
levels (mean-centered fluorescence ratios >2), and 42%
(representing 36 different genes) are found associated with
comparably highly elevated mRNA levels (mean-centered fluorescence
ratios >4).
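
The concordance figures above are fractions of high-level
amplifications whose matched mRNA ratio exceeds a cutoff. A minimal
Python sketch of that tally follows, using the thresholds stated in
the text (DNA ratio >4; mean-centered mRNA ratio >2 or >4). The
function name, data layout, and toy measurements are hypothetical and
do not reproduce the study's data.

```python
def concordance(pairs, dna_cutoff=4.0, rna_cutoffs=(2.0, 4.0)):
    """Fraction of high-level amplifications with elevated mRNA.

    pairs: (DNA copy number ratio, mean-centered mRNA ratio) for one
    gene in one sample. Returns {mRNA cutoff: fraction above cutoff}.
    """
    amplified = [rna for dna, rna in pairs if dna > dna_cutoff]
    if not amplified:
        return {cut: 0.0 for cut in rna_cutoffs}
    return {
        cut: sum(rna > cut for rna in amplified) / len(amplified)
        for cut in rna_cutoffs
    }

# Toy measurements: four amplified genes and one unamplified gene.
toy_pairs = [(5.0, 4.5), (6.0, 2.5), (4.2, 1.1), (8.0, 6.0), (1.0, 3.0)]
fractions = concordance(toy_pairs)
```

In this toy example three of the four amplified measurements exceed
the 2-fold mRNA cutoff and two exceed the 4-fold cutoff; the
unamplified gene is ignored even though its expression is elevated.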

Fig. 4. Genome-wide influence of DNA copy number alterations on mRNA levels. (a) For breast cancer cell lines (gray) and tumor samples (black), both
mean-centered mRNA fluorescence ratio (log2 scale) quartiles (box plots indicate 25th, 50th, and 75th percentile) and averages (diamonds; y-value error bars
indicate standard errors of the mean) are plotted for each of five classes of genes, representing DNA deletion (tumor/normal ratio <0.8), no change (0.8-1.2),
low- (1.2-2), medium- (2-4), and high-level (>4) amplification. P values for pair-wise Student's t tests, comparing averages between adjacent classes (moving
left to right), are 4 x 10^-[...], 1 x 10^-[...], 5 x 10^-[...], 1 x 10^-[...] (cell lines), and 1 x 10^-[...], 1 x 10^-[...], 5 x 10^-[...], 1 x 10^-[...] (tumors). (b) Distribution of correlations between
DNA copy number and mRNA levels, for 6,095 different human genes across 37 breast tumor samples. (c) Plot of observed versus expected correlation coefficients.
The expected values were obtained by randomization of the sample labels in the DNA copy number data set. The line of unity is indicated. (d) Percent variance
in gene expression (among tumors) directly explained by variation in gene copy number. Percent variance explained (black line) and fraction of data retained
(gray line) are plotted for different fluorescence intensity/background (a rough surrogate for signal/noise) cutoff values. Fraction of data retained is relative
to the 1.2 intensity/background cutoff. Details of the linear regression model used to estimate the fraction of variation in gene expression attributable to
underlying DNA copy number alteration can be found in the supporting information (see Estimating the Fraction of Variation in Gene Expression Attributable
to Underlying DNA Copy Number Alteration).

To determine the extent to which DNA deletion and
lower-level amplification (in addition to high-level amplification) are
also associated with corresponding alterations in mRNA levels,
we performed three separate analyses on the complete data set
(4 cell lines and 37 tumors, across 6,095 genes). First, we
determined the average mRNA levels for each of five classes
of genes, representing DNA deletion, no change, and low-,
medium-, and high-level amplification (Fig. 4a). For both the
breast cancer cell lines and tumors, average mRNA levels
tracked with DNA copy number across all five classes, in a
statistically significant fashion (P values for pair-wise Student's
t tests comparing adjacent classes: cell lines, 4 x 10^-[...],
1 x 10^-[...], 5 x 10^-[...], 1 x 10^-[...]; tumors, 1 x 10^-[...],
1 x 10^-[...], 5 x 10^-[...], 1 x 10^-[...]). A linear regression of
the average log(DNA copy number), for each class, against average
log(mRNA level) demonstrated that on average, a 2-fold change in
DNA copy number was accompanied by 1.4- and 1.5-fold changes in
mRNA level for the breast cancer cell lines and tumors,
respectively (Fig. 4a, regression line not shown). Second, we
characterized the distribution of the 6,095 correlations between
DNA copy number and mRNA level, each across the 37 tumor samples
(Fig. 4b). The distribution of correlations forms a normal-shaped
curve, but with the peak markedly shifted in the positive direction
from zero. This shift is statistically significant, as evidenced in
a plot of observed vs. expected correlations (Fig. 4c), and reflects
a pervasive global influence of DNA copy number alterations on
gene expression. Notably, the highest correlations between DNA
copy number and mRNA level (the right tail of the distribution
in Fig. 4b) comprise both amplified and deleted genes (data not
shown). Third, we used a linear regression model to estimate the
fraction of all variation measured in mRNA levels among the 37
tumors that could be attributed to underlying variation in DNA
copy number. From this analysis, we estimate that, overall, about
7% of all of the observed variation in mRNA levels can be
explained directly by variation in copy number of the altered
genes (Fig. 4d). We can reduce the effects of experimental
measurement error on this estimate by using only that fraction
of the data most reliably measured (fluorescence intensity/
background >3); using that data, our estimate of the percent
variation in mRNA levels directly attributed to variation in gene
copy number increases to 12% (Fig. 4d). This still undoubtedly
represents a significant underestimate, as the observed variation
in global gene expression is affected not only by true variation in
the expression programs of the tumor cells themselves, but also
by the variable presence of non-tumor cell types within clinical
samples.
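
The first two analyses above can be illustrated with a short Python
sketch. It assigns a DNA ratio to one of the five classes from the
Fig. 4 legend, converts a log-log regression slope b into the fold
change in mRNA per 2-fold change in copy number (2^b), and computes a
Pearson correlation of the kind summarized in Fig. 4b. This is a
sketch under stated assumptions, not the authors' analysis code: how
values falling exactly on a class boundary are assigned is a guess,
the function names are hypothetical, and the toy data are constructed
so that the answer is 1.5-fold.

```python
import math

def copy_number_class(ratio):
    """Five classes from the Fig. 4 legend (boundary handling assumed)."""
    if ratio < 0.8:
        return "deletion"
    if ratio <= 1.2:
        return "no change"
    if ratio <= 2.0:
        return "low"
    if ratio <= 4.0:
        return "medium"
    return "high"

def _moments(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy, sxx, syy

def fold_change_per_doubling(dna_ratios, rna_ratios):
    """2**slope of the least-squares fit of log2(mRNA) on log2(DNA)."""
    xs = [math.log2(d) for d in dna_ratios]
    ys = [math.log2(r) for r in rna_ratios]
    sxy, sxx, _ = _moments(xs, ys)
    return 2.0 ** (sxy / sxx)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    sxy, sxx, syy = _moments(xs, ys)
    return sxy / math.sqrt(sxx * syy)

# Toy values: mRNA rises 1.5-fold for every 2-fold gain in copy number.
dna = [0.5, 1.0, 2.0, 4.0, 8.0]
rna = [1.5 ** math.log2(d) for d in dna]
fold = fold_change_per_doubling(dna, rna)
r = pearson([math.log2(d) for d in dna], [math.log2(x) for x in rna])
```

A slope of about 0.58 on the log2-log2 scale therefore corresponds to
the 1.5-fold change per copy number doubling reported for the tumors,
since 2^0.58 is approximately 1.5.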

Discussion 

This genome-wide, array CGH analysis of DNA copy number
alteration in a series of human breast tumors demonstrates the
usefulness of defining amplicon boundaries at high resolution
(gene-by-gene), and quantitatively measuring amplicon shape, to
assist in locating and identifying candidate oncogenes. By
analyzing mRNA levels in parallel, we have also discovered that
changes in DNA copy number have a large, pervasive, direct
effect on global gene expression patterns in both breast cancer
cell lines and tumors. Although the DNA microarrays used in our
analysis may display a bias toward characterized and/or highly
expressed genes, because we are examining such a large fraction
of the genome (approximately 20% of all human genes), and
because, as detailed above, we are likely underestimating the
contribution of DNA copy number changes to altered gene
expression, we believe our findings are likely to be generalizable
(but would nevertheless still be remarkable if only applicable to
this set of ~6,100 genes).

In budding yeast, aneuploidy has been shown to result in
chromosome-wide gene expression biases (13). Two recent
studies have begun to examine the global relationship between
DNA copy number and gene expression in cancer cells. In
agreement with our findings, Phillips et al. (14) have shown that
with the acquisition of tumorigenicity in an immortalized
prostate epithelial cell line, new chromosomal gains and losses
resulted in a statistically significant respective increase and
decrease in the average expression level of involved genes. In
contrast, Platzer et al. (15) recently reported that in metastatic
colon tumors only ~4% of genes within amplified regions were
found more highly (>2-fold) expressed, when compared with
normal colonic epithelium. This report differs substantially from
our finding that 62% of highly amplified genes in breast cancer
exhibit at least 2-fold increased expression. These contrasting
findings may reflect methodological differences between the
studies. For example, the study of Platzer et al. (15) may have
systematically under-measured gene expression changes. In this
regard it is remarkable that only 14 transcripts of many thousand
residing within unamplified chromosomal regions were found to
exhibit at least 4-fold altered expression in metastatic colon
cancer. Additionally, their reliance on lower-resolution
chromosomal CGH may have resulted in poorly delimiting the
boundaries of high-complexity amplicons, effectively overcalling
regions with amplification. Alternatively, the contrasting findings
for amplified genes may represent real biological differences
between breast and metastatic colon tumors; resolution of this
issue will require further studies.

Our finding that widespread DNA copy number alteration has
a large, pervasive and direct effect on global gene expression
patterns in breast cancer has several important implications.
First, this finding supports a high degree of copy
number-dependent gene expression in tumors. Second, it suggests that
most genes are not subject to specific autoregulation or dosage
compensation. Third, this finding cautions that elevated
expression of an amplified gene cannot alone be considered strong
independent evidence of a candidate oncogene's role in
tumorigenesis. In our study, fully 62% of highly amplified genes
demonstrated moderately or highly elevated expression. This
highlights the importance of high-resolution mapping of amplicon
boundaries and shape [to identify the "driving" gene(s)
within amplicons (16)], on a large number of samples, in addition
to functional studies. Fourth, this finding suggests that analyzing
the genomic distribution of expressed genes, even within existing
microarray gene expression data sets, may permit the inference
of DNA copy number aberration, particularly aneuploidy (where
gene expression can be averaged across large chromosomal
regions; see Fig. 3 and supporting information). Fifth, this
finding implies that a substantial portion of the phenotypic
uniqueness (and by extension, the heterogeneity in clinical
behavior) among patients' tumors may be traceable to
underlying variation in DNA copy number. Sixth, this finding supports
a possible role for widespread DNA copy number alteration in
tumorigenesis (17, 18), beyond the amplification of specific
oncogenes and deletion of specific tumor suppressor genes.
Widespread DNA copy number alteration, and the concomitant
widespread imbalance in gene expression, might disrupt critical
stoichiometric relationships in cell metabolism and physiology
(e.g., proteasome, mitotic spindle), possibly promoting further
chromosomal instability and directly contributing to tumor
development or progression. Finally, our findings suggest the
possibility of cancer therapies that exploit specific or global
imbalances in gene expression in cancer.
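
The fourth implication, inferring aneuploidy from expression data
alone, rests on averaging expression over large chromosomal regions.
A minimal Python sketch of that averaging follows. It is an
illustration of the idea rather than a validated inference method:
the grouping by chromosome arm, the function name, and the toy values
are all assumptions made here.

```python
from collections import defaultdict

def mean_expression_by_region(genes):
    """Average mean-centered log2 expression ratios per region.

    genes: iterable of (region label, log2 expression ratio), where the
    region label is assumed here to be a chromosome arm such as "8q".
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for region, value in genes:
        totals[region] += value
        counts[region] += 1
    return {region: totals[region] / counts[region] for region in totals}

# Toy data: 8q genes skew high, 8p genes skew low, 2q genes are balanced.
toy_genes = [("8q", 0.9), ("8q", 0.7), ("8p", -0.6),
             ("8p", -0.4), ("2q", 0.1), ("2q", -0.1)]
region_means = mean_expression_by_region(toy_genes)
```

A region whose average departs consistently from zero across many
genes would be a candidate for gain or loss, subject to confirmation
by direct copy number measurement.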